The influence of walking speed and effects of signal processing methods on the level of human gait regularity during treadmill walking

Background: In recent years, the use of sample entropy (SampEn) to evaluate the complexity of the locomotor system in human gait data has gained popularity. However, it has been suggested that SampEn is sensitive to various input parameters and signal preprocessing methods. This study quantified the effects of different temporal and spatial normalization approaches and of various lengths of the template vector (m) on SampEn calculations. The discriminatory ability of SampEn was studied by comparing two walking conditions.
Methods: Twenty-three participants (seven males, 55.7 ± 8.5 years, 165.7 ± 7.9 cm, 80.5 ± 16.7 kg) walked on a treadmill at preferred (Vpref) and maximum (Vmax) speed. Data were segmented and resampled (SEGM), resampled and spatially normalized (NORM), and resampled and detrended (ZERO).
Results: For vertical ground reaction force (vGRF) and center of pressure in the anterior-posterior direction (COPap), in both walking conditions, SampEn was generally sensitive to the vector length and not to the data processing, except for COPap in ZERO at m = 2 and m = 4. For center of pressure in the medio-lateral direction (COPml), SampEn behaved oppositely: it was sensitive to the preprocessing method and not to the length m. The regularity of COPap and vGRF in all processed signals increased in the Vmax condition. For COPml, only two signals, WHOLE and ZERO, revealed increased complexity caused by the more demanding walking condition.
Conclusions: SampEn was able to discriminate between the walking conditions in all analyzed variables, but not in all signals. Depending on the evaluated variable, SampEn responded differently to the value of m and to the processing method. Hence, these should be checked and selected for each variable independently. For future studies evaluating the influence of walking velocity on COP and vGRF regularity during treadmill walking, it is advised to use raw time series. Furthermore, to keep the template vector biologically relevant, it is advised to detect the highest frequencies present in the analyzed signals and to evaluate the minimal time interval that can reflect a change caused by the response of the neuromuscular system. When evaluating treadmill walking measured at a 100 Hz sampling frequency, it is recommended to adopt m from 6 to 10 when the average stride time is up to about 1 s.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13102-022-00600-4.


Introduction
The aim of this supplementary material was to investigate the influence of different tolerance levels (r) on SampEn for each variable (COPml, COPap, vGRF), each signal processing method (WHOLE, SEGM, ZERO, NORM), and each walking condition (Vpref, Vmax). All analyses combined the previously selected vector length m = 6 with r = 0.1, 0.15, 0.2, 0.25 and 0.3. The methodological procedure was the same as in the main manuscript. A two-way ANOVA was used to compare the effects of tolerance level (r = 0.1, 0.15, 0.2, 0.25, 0.3) and data processing method (Type = WHOLE, SEGM, NORM, ZERO) on the calculated sample entropy. For vertical ground reaction force (Fy), the 'Type' factor had only two levels (WHOLE, SEGM).
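The dependence of SampEn on the tolerance r at a fixed m = 6 can be illustrated with a minimal sketch. It is not the code used in the study: the function name `sample_entropy`, the scaling of r by the standard deviation of the series, the use of `<=` when comparing distances with the tolerance, and the synthetic noisy sine standing in for a gait signal are all assumptions made for illustration.

```python
import numpy as np


def sample_entropy(x, m=6, r=0.2):
    """Return SampEn(m, r) of a 1-D series; r is a fraction of the series SD."""
    x = np.asarray(x, dtype=float)
    n = x.size
    tol = r * np.std(x)  # assumption: tolerance scaled by the SD of the series

    def count_matches(length):
        # The same n - m template start points are used for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance to all later templates (no self-matches).
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= tol))
        return count

    b = count_matches(m)        # matching template pairs of length m
    a = count_matches(m + 1)    # matching template pairs of length m + 1
    if a == 0 or b == 0:
        return float("nan")     # undefined when no matches are found
    return float(-np.log(a / b))


# Synthetic stand-in for a gait signal: 10 "strides" of 100 samples plus noise.
rng = np.random.default_rng(0)
t = np.arange(1000)
signal = np.sin(2 * np.pi * t / 100) + 0.05 * rng.standard_normal(t.size)

for r in (0.1, 0.15, 0.2, 0.25, 0.3):
    print(f"r = {r:.2f}  SampEn = {sample_entropy(signal, m=6, r=r):.3f}")
```

A larger r relaxes the criterion for counting two template vectors as similar, so more pairs match at both lengths and the estimate for a signal of this kind would be expected to fall as r grows.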
In most cases the parameters were normally distributed. In 6 of 100 subgroups the normality assumption was violated; however, skewness in these groups was about |0.5|, with two exceptions (r = 0.1 COPap_Vmax_NORM and r = 0.3 COPml_Vpref_NORM, where skewness was 1 and 2.11, respectively). At r = 0.2 all data were normally distributed. The F-test is said to be robust with respect to the assumptions of normality and equality of variances so long as each group contains the same number of scores (Lindman 1974; Box 1954a); however, if the assumption of homogeneity was violated, the correlations between means and variances were inspected. The assumption of sphericity was assessed using Mauchly's test. When this assumption was violated, the degrees of freedom of the F-ratio were adjusted using the Greenhouse-Geisser epsilon, thereby making the F-test more conservative.
As expected, there was an overall tendency toward lower SampEn across the Fy, COPap and COPml parameters with increasing r (more relaxed criteria for counting similar vectors). The results indicate that the relations between signal types did not change with r (Figs. 3-6). Only for COPml, in both walking conditions (Vpref and Vmax), was there a significant Type × r interaction effect (p < 0.001) (Figs. 1 and 2). For COPml in the Vpref condition, the post-hoc tests revealed that the relation between 'Types' changed with increasing r. At r = 0.2 the ZERO and WHOLE signals differed significantly, and this difference disappeared at r = 0.25 and r = 0.3. The same happened with the NORM and SEGM signals (Fig. 1). A similar situation could be seen between r = 0.2 and r = 0.3 in COPml_Vmax (Fig. 2).
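The Greenhouse-Geisser adjustment of the degrees of freedom mentioned above can be sketched for a single within-subject factor as follows. This is an illustration, not the statistical software used in the study: the function names, the restriction to one factor, and the random demo data (23 participants by 5 levels) are assumptions. Epsilon is computed from the double-centred covariance matrix of the factor levels and lies between 1/(k - 1) and 1 for k levels.

```python
import numpy as np


def greenhouse_geisser_epsilon(scores):
    """Epsilon for one within-subject factor; scores is subjects x levels."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    cov = np.cov(scores, rowvar=False)          # k x k covariance of levels
    centering = np.eye(k) - np.ones((k, k)) / k
    cov_c = centering @ cov @ centering         # double-centred covariance
    return float(np.trace(cov_c) ** 2 / ((k - 1) * np.sum(cov_c ** 2)))


def adjusted_df(scores):
    """Greenhouse-Geisser corrected (df_effect, df_error) for the F-ratio."""
    n, k = np.asarray(scores).shape
    eps = greenhouse_geisser_epsilon(scores)
    return eps * (k - 1), eps * (k - 1) * (n - 1)


# Hypothetical data: 23 participants x 5 tolerance levels (random numbers).
rng = np.random.default_rng(1)
demo = rng.standard_normal((23, 5)).cumsum(axis=1)
eps = greenhouse_geisser_epsilon(demo)
df_effect, df_error = adjusted_df(demo)
print(f"epsilon = {eps:.3f}, df = ({df_effect:.2f}, {df_error:.2f})")
```

Because epsilon never exceeds 1, both degrees of freedom shrink or stay the same, which raises the critical F value and makes the test more conservative.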
To check whether the discriminatory ability of SampEn in COPml remained the same at r = 0.25 and r = 0.3 as at r = 0.2, we conducted a one-way repeated-measures ANOVA. Post-hoc analysis revealed that at r = 0.25 and r = 0.3 only SampEn of the ZERO and WHOLE signals increased significantly during walking at maximum speed compared with preferred speed (p < 0.01) (Figs. 7 and 8). These results are consistent with the outcomes obtained with r = 0.2.
The results of the aforementioned analyses indicate that using r = 0.2 with our data is valid.
To check whether our parameter selection (m = 6 and r = 0.2) yields an acceptable length of the confidence interval (CI) around the SampEn estimate, we adopted the method proposed by Lake et al. (2002), which penalizes conditional probabilities near 0 and near 1. Figure 9 shows that for m = 6 and r = 0.2 the CI did not exceed ~10% of the SampEn estimate (maximum relative error no higher than ~0.023).
Fig. A9 The mean value of the sample entropy (SampEn) efficiency metric for 23 COPap data records, plotted as a function of m and r. A value of 0.023 corresponds to a 95% confidence interval (CI) that is less than 10% of the SampEn estimate.